Welcome to the world of R. First, you'll need access to R. You can download R and RStudio here. Installation takes two steps: first install R, then install RStudio.
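R and RStudio are not the same thing: R is the programming language that runs your code, while RStudio is a program that gives you a friendlier interface for writing and running R code.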
Over the last decade, R has become the “go to” tool for data analysis in psychological research. R is free and open source. Coding is a highly desirable and transferable skill. However, you don't need to become an advanced genius coder. Understanding how coding can help you organise, analyse, and present data will be enough for a psychology undergraduate degree. There are lots of reasons to use R in psychological research, and you can read more about them here.
When you open RStudio you should see a window that looks like this:
You will see three sections:
The console is the largest panel on the left. This is where R will produce any written output for you to read and make sense of – almost like a printer.
The environment, the top right panel, is where R keeps a list of any data you are working with. It is almost like R's memory.
The files panel on the bottom right does a few things, as it has a few different tabs. I’ll talk through the most commonly used tabs:
You can use “projects” in R to help keep your work tidy and organised. A project also means you can save your work and come back to it at any time in the future. If you are a PS2010 student at Royal Holloway, I'd recommend you create one project for all of your PS2010 work in R. Make sure you save it somewhere sensible; for example, if you're using a campus PC, save it onto your Y: drive.
How to create a new project:
In the top right corner, find and click on the blue button which says “Project: (None)”.
Select “New Directory”.
Select “New Project”.
Give your project a suitable name. Royal Holloway students, I recommend you call it the module code (e.g., ps1010 or ps2010).
Now you are ready to create a new script. Look for the New File button in the top left corner of your screen, click on it and select “R Script”.
If you have followed those steps correctly, a new panel should open up in RStudio.
This new panel on the top left is the “script” panel. This is where you enter your code, so think of the script panel as an input panel. Helpfully, you can save your script at any time, which means you can come back to your code at a later date.
Let’s begin with some very simple coding in the script panel. In your new script panel add the following:
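A minimal example of what you could type, assuming a comment line plus a simple sum (the wording of the comment is only an illustration):

```r
# My first R script: this line is a note for humans, so R skips it
2 + 2  # R works out the sum on the left and ignores this note
```

Running the second line should print [1] 4 in the console.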
A useful thing to know is that anything you write after # (the hashtag symbol) is called a “comment”. Comments are a way to keep notes that R won't read: R ignores anything that comes after a hashtag on the same line. Think of these as human notes, ignored by the computer!
Annotating your code with comments is a really good habit to get into because it means you have the comment to look at in the future, almost like revision notes to remind you what each line of code does. It means when you come back to re-run or recycle code in the future, you can figure out what it does quickly.
Now let’s get R to work. In the script panel enter the following:
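Judging by the discussion that follows, the line to enter here is R's built-in date() function, which needs nothing inside its brackets:

```r
date()
```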
STOP! You might remember from above I mentioned that using comments to annotate your code is best practice. Well straight away I have ignored my own advice. Let’s try that again:
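With a comment added above it, the line might look like this (the comment wording is my own):

```r
# Ask R for today's date and the current time
date()
```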
I suppose it is easy to work out what date() does without the comment. But as your code gets more complicated, using comments will become so important!
How to run the code:
There are a few ways to run code in R. The easiest is probably to make sure your cursor is at the end of the line you’ve just written and then press CTRL + ENTER if you’re on a Windows PC or COMMAND + ENTER if you’re on a Mac. Give this a go now and R should tell you the date (check the console – bottom left panel).
Why not have a go at asking R something else? At heart, R is a very sophisticated calculator, so let's try some basic and more complex sums to get you used to running lines of code. Try out the sums below, running each one, or add any sum you like.
Just a note on symbols for maths questions:
+ will add.
- will subtract.
* will multiply.
/ will divide.
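Some example sums to try, using the symbols above. The numbers are only illustrations, so swap in any you like. The last two lines show that R follows the usual order of operations: brackets first, then multiplication and division, then addition and subtraction.

```r
5 + 3        # addition: [1] 8
10 - 4       # subtraction: [1] 6
6 * 7        # multiplication: [1] 42
20 / 4       # division: [1] 5
(2 + 3) * 4  # brackets are worked out first: [1] 20
2 + 3 * 4    # multiplication happens before addition: [1] 14
```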
Who doesn’t love a compliment? Let’s create a random compliment generator in R. Enter and run the code below to receive a compliment:
```r
# A vector holding four compliments
compliments = c("You're awesome!", "You're a coding superstar!",
                "Keep on slaying this workshop!", "You’re the best")
# Pick one compliment at random
random_compliment = sample(compliments, 1)
# Show the chosen compliment in the console
print(random_compliment)
```

If you want to know what each part of the compliment generator code does, I'll explain below:
compliments = c() created a vector, in this case a small data set with four compliments.
random_compliment = sample(compliments, 1) asked R to create an object holding one compliment sampled at random from the vector.
print(random_compliment) printed the sampled compliment.
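If you want to see these pieces working one at a time, here is a small sketch you can run line by line. The length(), square-bracket and %in% checks are my additions to make each step visible; they are not part of the original generator. I use <- for assignment here, which does the same job as = in these lines.

```r
# Build the vector of compliments (same idea as above)
compliments <- c("You're awesome!", "You're a coding superstar!",
                 "Keep on slaying this workshop!", "You're the best")

length(compliments)    # how many compliments are in the vector: [1] 4
compliments[2]         # square brackets pull out one element by position

# sample(x, 1) picks 1 element of x at random, so this can differ on every run
random_compliment <- sample(compliments, 1)

# The picked compliment always comes from the vector: [1] TRUE
random_compliment %in% compliments
```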
Copy and paste the code for the random compliment generator and adapt it to create a random insult generator! Playing around with and tweaking code is a fun way to improve your skills.
If you get stuck, here are the changes to make:
compliments to insults
random_compliment to random_insult
sample(compliments) to sample(insults)
print(random_compliment) to print(random_insult)

PS1010 students: That's it for today's workshop! Remember to save your R project and script. Check where your project has been saved on your Y: drive so you know where to find it in the future. We will revisit this project next week. You will also need these resources for the quiz.
In the previous section you learnt what RStudio looks like and how to write and execute some fairly basic lines of code. Now it is time to import some data into R to begin working with it. RStudio likes a particular data file format: comma separated values, or .csv. You can save Excel files as .csv files.
There are also some other important rules about how data should be laid out in an Excel file, but we will come back to those later.
For now, the aim is to get a pre-existing data set into R and to produce a simple graph to display the data.
R is very particular. This means when you enter code, it needs to be entered perfectly, with symbols and letters placed exactly where they should be. R code is also case sensitive!
Most error messages you see when first learning to code tend to come from typos. Do take your time and make sure that the code has been copied exactly as it should be – any symbol or space in the wrong place will make R upset.
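A quick way to see case sensitivity for yourself, using a made-up object called happiness: to R, happiness and Happiness are two different names, so only the one you actually created exists.

```r
happiness <- 8          # create an object called happiness (lower-case h)
happiness               # R finds it: [1] 8
exists("happiness")     # [1] TRUE
exists("Happiness")     # [1] FALSE: a capital H makes it a different name
```

If you type Happiness on its own, R stops with an error telling you the object 'Happiness' was not found.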
A helpful first step is to load any packages that you might need.
Packages are essentially “add-ons” that you can use within R. There are
loads of packages out there that will allow you to carry out various
tasks in R. For now we need one called tidyverse. Use the
code below:
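Based on the explanation that follows, the code for this step is most likely the standard pair of commands: install the tidyverse once, then load it with library() in each session.

```r
# Install the tidyverse (only needed once per computer)
install.packages("tidyverse")

# Load it (needed every time you start a new R session)
library(tidyverse)
```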
install.packages() is the command used to ask RStudio to install a package you want.
library() is how we ask RStudio to load the package (to open it in R). You might see some messages in red, but these are usually just warnings about which version of R was used to build the package(s).
The working directory can be any folder on your computer. It's important to know which folder this is, as this is where RStudio will look for any data files. You can check the working directory at any time using:
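Judging by the later mention of getwd(), this is the command to use; like date(), it needs nothing inside the brackets:

```r
getwd()   # prints the full path of the current working directory
```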
Give that a try now. As you followed the previous steps of “Creating a Project” earlier on (scroll up!), hopefully your working directory is already set to that same project directory. However, if not, or if you want to change your working directory, you can do so following these steps:
Click the “Session” tab along the top of the RStudio window.
Click “Set Working Directory”.
Click “Choose Directory”.
A window will pop up and you need to select the folder on your computer that you want to set as your working directory.
You can double check this has worked by using getwd() at any time.
It is vital you know where the working directory is, as any external data files you use must be saved into the working directory.
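The menu steps above also have a code equivalent, base R's setwd(). The sketch below is only an illustration: it uses tempdir() (a temporary folder R creates for each session) as a stand-in for your own folder, then switches back. In practice you would put your own folder path in quotes, and the menu route is usually easier.

```r
old_wd <- getwd()        # remember the current working directory

my_folder <- tempdir()   # stand-in: replace with the path to your own folder
setwd(my_folder)         # make that folder the working directory
getwd()                  # double check: this should now show the new folder

setwd(old_wd)            # switch back to where we started
```

On Windows, write your own path with forward slashes (for example a hypothetical "Y:/ps1010"), because a single backslash has a special meaning inside R text.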
For today's example you need to download the corr.csv data set. PS1010 students can find this on Moodle or use this link. Make sure the file is saved into your working directory, and make sure it is saved as a .csv file. You can open the file to look at it (in fact, I encourage you to do this), but do not save it again as a different file type; it must stay as a .csv.
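Once the file is saved, you can check from R that it landed in the right place. This small sketch uses base R's list.files() and file.exists(); the second line should show TRUE once corr.csv is in your working directory.

```r
list.files()              # every file and folder in the working directory
file.exists("corr.csv")   # TRUE if corr.csv is there, FALSE if not
```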
R likes to work with a particular type of data file. When we finish an experiment and have collected all the data, we need to store that data somewhere, usually in an Excel spreadsheet. R prefers these spreadsheets to be saved as a .csv (comma separated values) file. You don't need to change the file this time around, as it is already a .csv file, but in future it is really easy to save any Excel file as a .csv. If you want to know how to do this, take a look here.
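To see why it's called "comma separated values", here is a small sketch that writes two rows of the workshop data to a temporary .csv file with base R's write.csv() and then reads the raw text back. The file location comes from tempfile(), so nothing in your working directory is touched.

```r
# Two rows taken from the workshop data, for illustration only
example <- data.frame(time_outside = c(120, 101), happiness = c(8, 7))

csv_path <- tempfile(fileext = ".csv")           # a throwaway file location
write.csv(example, csv_path, row.names = FALSE)  # save as .csv, no row numbers

readLines(csv_path)   # the raw text inside the file
```

The first line holds the column names (in quotes), and each line after it is one row of data, with the values separated by commas: "120,8" and "101,7".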
The next step is to get the data from the .csv file into
RStudio. Providing you have downloaded the file correctly, and saved it
in .csv format in your working directory, the following
code should work:
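Going by the names used in this section (corr.csv, mydata, and the later mention of read_csv), the code is most likely the following. read_csv() comes from the readr package, which library(tidyverse) loads.

```r
library(tidyverse)               # loads readr, which provides read_csv()

# Read corr.csv from the working directory and store it as mydata
mydata <- read_csv("corr.csv")
```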
If successful, you should now be able to see mydata
appear in the top right panel, under the environment tab.
This means your data is now in RStudio.
Alternatively, if you can't load the file, use this code to create the same data manually as a data frame:

```r
# Create the corr.csv data frame manually without using read_csv
mydata <- data.frame(
  time_outside = c(120, 101, 85, 55, 41, 22, 123, 90, 66, 32, 12, 130, 90,
                   50, 33, 70, 65, 54, 111, 24, 11, 115, 9, 80, 129),
  happiness = c(8, 7, 6, 5, 4, 1, 10, 5, 6, 4, 5, 6, 4, 3, 5, 7, 7, 6, 8,
                3, 4, 10, 2, 6, 10))
```

First, it might be useful to know what the data set consists of. As I'm writing this it is nearly spring (2024) and I'm looking forward to spending more time outside. It makes me happy. Well, anyway, the data set for this exercise contains two variables:
time_outside (time spent outside per week, in minutes)
happiness (a happiness score)
Our question today is: does spending more time outside make you happier?
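Before plotting, it can help to look at the data. This sketch recreates mydata from the values above (so it runs on its own) and uses a few base R functions to check it; nrow(), names(), head() and summary() are my additions, not steps from the original workshop.

```r
# Same values as the manual data frame above
mydata <- data.frame(
  time_outside = c(120, 101, 85, 55, 41, 22, 123, 90, 66, 32, 12, 130, 90,
                   50, 33, 70, 65, 54, 111, 24, 11, 115, 9, 80, 129),
  happiness = c(8, 7, 6, 5, 4, 1, 10, 5, 6, 4, 5, 6, 4, 3, 5, 7, 7, 6, 8,
                3, 4, 10, 2, 6, 10))

nrow(mydata)      # number of rows: [1] 25
names(mydata)     # the column (variable) names
head(mydata)      # the first six rows
summary(mydata)   # min, max, mean and quartiles for each variable
```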
A scatter plot is a type of graph that shows a potential relationship between two variables, for example, as one thing increases does another thing also increase? Or maybe there is no relationship at all? We can graphically look at this using a scatter plot. Let’s plot our time outside and happiness data. Below is the basic code.
```r
ggplot(data, aes(x = x_var, y = y_var)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Scatter Plot with Line of Best Fit",
       x = "X-axis Label",
       y = "Y-axis Label")
```

data is where you tell RStudio the name of your data set. Earlier we called it mydata, so this will need changing.
x_var is where you enter the name of your x-axis variable, exactly as it appears in the .csv file. It should be time_outside.
y_var is where you enter the name of your y-axis variable, exactly as it appears in the .csv file. It should be happiness.
x = "X-axis Label" and y = "Y-axis Label" are where you tell RStudio what text labels to display on each axis. For the x-axis I'd go for "Time Spent Outside Per Week (mins)" and for the y-axis I'd go for "Happiness Score", as these match the variables you specified in the first line of the code. Make sure you use speech marks, as this is a piece of text that will be shown on the graph.
labs(title = "Scatter Plot with Line of Best Fit", is also where your figure title appears. You can amend this to whatever is appropriate, or remove it; for example, if you're presenting an APA formatted figure you wouldn't need a title.

Have a go at amending the above code yourself to match the data set. Once you've done this, compare your version with the correct answer below.
Remember that R code is case sensitive, so you have to write variable names exactly as they appear in the .csv file. Remember the golden rule!
```r
ggplot(mydata, aes(x = time_outside, y = happiness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(title = "Scatter Plot with Line of Best Fit",
       x = "Time Spent Outside Per Week (mins)",
       y = "Happiness Score")
```

If you want the figure without the title, use the code below. Can you spot which line has been removed so that the title no longer shows?
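Based on the question above, the no-title version most likely drops the title from labs() and keeps everything else. In this sketch I store the plot in an object called p (my own choice) and recreate mydata so the block runs on its own; if you already have mydata loaded, you only need the ggplot lines.

```r
library(ggplot2)   # ggplot2 is one of the packages library(tidyverse) loads

# Same values as the manual data frame above
mydata <- data.frame(
  time_outside = c(120, 101, 85, 55, 41, 22, 123, 90, 66, 32, 12, 130, 90,
                   50, 33, 70, 65, 54, 111, 24, 11, 115, 9, 80, 129),
  happiness = c(8, 7, 6, 5, 4, 1, 10, 5, 6, 4, 5, 6, 4, 3, 5, 7, 7, 6, 8,
                3, 4, 10, 2, 6, 10))

p <- ggplot(mydata, aes(x = time_outside, y = happiness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(x = "Time Spent Outside Per Week (mins)",   # no title = "..." here
       y = "Happiness Score")

p   # typing the object's name draws the plot in the Plots tab
```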
Once you have amended the code, run it and check out the “Plots” tab in the bottom right panel of RStudio.
PS1010 students: That’s it for today’s workshop! Remember to save your R project and script. Check where your project has been saved on your Y: Drive so you know where to find it in the future. You will need these resources for the quiz.